home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
- Internet Engineering Task Force Audio-Video Transport Working Group
- INTERNET-DRAFT H. Schulzrinne
- AT&T Bell Laboratories
- July 14, 1993
- Expires: 10/01/93
-
-
- Media Encodings
-
-
-
- Status of this Memo
-
-
- This document is an Internet Draft. Internet Drafts are working documents
- of the Internet Engineering Task Force (IETF), its Areas, and its Working
- Groups. Note that other groups may also distribute working documents as
- Internet Drafts.
-
- Internet Drafts are draft documents valid for a maximum of six months.
- Internet Drafts may be updated, replaced, or obsoleted by other documents
- at any time. It is not appropriate to use Internet Drafts as reference
- material or to cite them other than as a ``working draft'' or ``work in
- progress.''
-
- Please check the I-D abstract listing contained in each Internet Draft
- directory to learn the current status of this or any other Internet Draft.
-
- Distribution of this document is unlimited.
-
-
- Abstract
-
-
- This document describes a possible structure of the media content
- for audio and video for Internet applications. The definitions
- are independent of the particular transport mechanism used. The
- descriptions provide pointers to reference implementations and
- the detailed standards. This document is meant as an aid
- for implementors of audio, video and other real-time multimedia
- applications.
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- 1 Audio
-
-
-
- 1.1 Encoding-independent recommendations
-
-
- The following recommendations are default operating parameters. An
- applications should be prepared to handle other values. The ranges
- given are meant to give guidance to application writers, allowing
- a set of applications conforming to these guidelines to interoperate
- without additional negotiation. These guidelines are not intended to
- restrict operating parameters for application that can negotiate a set of
- interoperable parameters, e.g., through a conference control protocol.
-
- For packetized audio, the default packetization interval should have a
- duration of 20 ms, unless otherwise noted in Table 1. The packetization
- interval determines the minimum end-to-end delay; longer packets introduce
- less header overhead but higher delay and make packet loss more noticeable.
- For on-interactive applications such as lectures or links with severe
- bandwidth constraints, a higher packetization delay may be appropriate. For
- frame-based encodings (marked as F in the table 1 below) such as LPC, CELP
- and GSM, the sender may choose to combine several frame intervals into a
- single message to reduce header overhead. The number of frames is single
- packetization interval, however, a sender may choose to combine several
- intervals into a single message. The receiver can tell the number of frames
- contained in a message since the nominal frame duration is defined as part
- of the encoding.
-
- If multiple channels are used, the left channel information always precedes
- the right-channel information. For more than two channels, the convention
- followed by the AIFF-C audio interchange format should be followed. It
- is listed in the table below. (The AIFF-C specification is available by
- anonymous ftp at sgi.sgi.com in the file sgi/aiff-c.9.26.91.ps.)
-
-
-
- type_______channels________________________________________________________________
- stereo left right
- 3 channel left right center
- quad front left front right rear left rear right
- 4 channel left center right surround
- 6 channel left left center center right right center surround
-
-
- The sampling frequency should be drawn from the set: 8, 11.025, 16, 22.05,
- 44.1 and 48 kHz. Preferred rates are 8, 16 and 48 kHz.
-
-
-
-
-
-
- H. Schulzrinne Expires 10/01/93 [Page 2]
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- 1.2 Recommended Audio Encodings
-
-
- The table 1 shows the names, types (sample vs. frame oriented) and default
- sampling frequencies of recommended encodings. The list is partially
- drawn from the document ``Recommended practices for enhancing digital
- audio compatibility in multimedia systems'', published by the Interactive
- Multimedia Assocation, Version 3.00, Oct. 1992 (referenced as [IMA]).
- The names are for identification only; they correspond to the names used
- within the Real-Time Transport Protocol (RTP). Other applications may choose
- different namings.
-
-
-
- name nom. sampling rate type frame description
- __________________kHz___kb/s__S/F___ms___________________________________
- L16 48 705.6 S 16-bit linear, 2's complement
- G722 16 64 S CCITT subband ADPCM
- PCMU 8 64 S CCITT mu-law PCM
- PCMA 8 64 S CCITT A-law PCM
- G721 8 32 S CCITT ADPCM
- DVI 8 32 S Intel/DVI ADPCM [IMA]
- G723 8 24 S CCITT ADPCM
- GSM 8 13 B 20 RTE/LTP GSM 06.10)
- _1016_______________8____4.8__B_____30_____CELP__________________________
-
-
- Table 1: Audio encodings
-
- For multi-octet encodings, octets are transmitted in network byte order
- (i.e., most significant octet first).
-
- A detailed description of the encodings is given below. The names shown
- (L16, PCMU, etc.) are limited to four characters and suitable to be used
- for identification in protocols such as RTP (RFC TBD).
-
-
- L16: denotes uncompressed audio data, using 16-bit signed representation
- with 65535 equally divided steps between minimum and maximum signal
- level, ranging from -32768 to 32767. The value is represented in two's
- complement notation.
-
- PCMU: specified in CCITT recommendation G.711. Audio data is encoded as
- eight bits per sample, after companding. Code to convert between
- linear and mu-law companded data is available in the IMA document.
-
- PCMA: specified in CCITT recommendation G.711. Audio data is encoded as
- eight bits per sample, after companding. Code to convert between
- linear and A-law companded data is available in the IMA document.
-
- G721 through G729: specified in the corresponding CCITT recommendations.
- Reference implementations for G.721 and G.723 are available as part of
-
- H. Schulzrinne Expires 10/01/93 [Page 3]
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- the CCITT Software Tool Library (STL) from the ITU General Secretariat,
- Sales Service, Place du Nations, CH-1211 Geneve 20, Switzerland. The
- library is covered by a license and is available for anonymous ftp on
- gaia.cs.umass.edu, file pub/ccitt/ccitt_tools.tar.Z.
-
- GSM: (group speciale mobile) denotes the European GSM 06.10 provisional
- standard for full-rate speech transcoding, prI-ETS 300 036, which
- is based on RPE/LTP (residual pulse excitation/long term prediction)
- coding at a rate of 13 kb/s. A reference implementation was written by
- Carsten Borman and Jutta Degener (TU Berlin, Germany) and is available
- for anonymous ftp from tub.cs.tu-berlin.de, directory tub/tubmik.
-
- 1016: uses code-excited linear prediction (CELP) and is specified in
- Federal Standard FED-STD 1016, published by the Office of Technology
- and Standards, Washington, DC 20305-2010.
-
- The U. S. DoD's Federal-Standard-1016 based 4800 bps code excited
- linear prediction voice coder version 3.2 (CELP 3.2) Fortran and
- C simulation source codes are available for worldwide distribution
- at no charge (on DOS diskettes, but configured to compile on Sun
- SPARC stations) from: Bob Fenichel, National Communications System,
- Washington, D.C. 20305, phone +1-703-692-2124, fax +1-703-746-4960.
-
- Example input and processed speech files, a technical information
- bulletin, and the official standard ``Federal Standard 1016, Telecom-
- munications: Analog to Digital Conversion of Radio Voice by 4,800
- bit/second Code Excited Linear Prediction (CELP)'' are included at no
- charge. According to Vincent Cate (Carnegie Mellon), the distribution
- is also available for anonymous ftp at furmint.nectar.cs.cmu.edu
- (128.2.209.111) in directory celp.audio.compression.
-
- The following articles describes the Federal-Standard-1016 4.8-kbps
- CELP coder:
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
- Proposed Federal Standard 1016 4800 bps Voice Coder: CELP,'' S_p_e_e_c_h_
- T_e_c_h_n_o_l_o_g_y_ M_a_g_a_z_i_n_e_, April/May 1990, p. 58-64.
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
- Federal Standard 1016 4800 bps CELP Voice Coder,'' D_i_g_i_t_a_l_ S_i_g_n_a_l_
- P_r_o_c_e_s_s_i_n_g_, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
- DoD 4.8 kbps Standard (Proposed Federal Standard 1016),'' in A_d_v_a_n_c_e_s_
- i_n_ S_p_e_e_c_h_ C_o_d_i_n_g_, ed. Atal, Cuperman and Gersho, Kluwer Academic
- Publishers, 1991, Chapter 12, p. 121-133.
-
- Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, ``The
- Proposed Federal Standard 1016 4800 bps Voice Coder: CELP,'' S_p_e_e_c_h_
- T_e_c_h_n_o_l_o_g_y_ M_a_g_a_z_i_n_e_, April/May 1990, p. 58-64.
-
-
-
- H. Schulzrinne Expires 10/01/93 [Page 4]
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- Copies of the FS-1016 document are available for $2.50 each from:
-
-
- GSA Rm 6654
- 7th & D St SW
- Washington, D.C. 20407
- 1-202-708-9205
-
-
- DVI: is specified in the ``Recommended Practices for Enhancing Digital
- Audio Compatibility in Multimedia Systems'', published by the
- Interactive Multimedia Association (IMA), Annapolis, MD. The document
- also contains reference implementations for mu-law to 16-bit, ADPCM and
- sample rate conversions.
-
-
-
- For sample-based encodings, a receiver should accept packets representing
- between 0 and 200 ms of audio data.(1) Receivers should be prepared to
- accept multi-channel audio, but may choose to only play a single channel.
-
-
- 1.3 Application Programming Interface for Audio Codecs
-
-
- The application programming interface (API) for audio codecs described here
- is suggested, but not required for interoperability. The API shown here is
- similar to the one used by SunOS 4.1. The encoding types are drawn from the
- standard names defined here.
-
-
- typedef {AE_PCMU = 1, AE_PCMA, AE_L16} encoding_t;
-
- typedef struct {
- unsigned sample_rate; /* samples per second */
- unsigned samples_per_unit; /* samples per unit */
- unsigned bytes_per_unit; /* bytes per sample unit */
- unsigned channels; /* # of interleaved channels */
- encoding_t encoding; /* data encoding format */
- unsigned data_size; /* length of data (optional) */
- } audio_descr_t;
-
- void *x_init(void *state, double period);
-
- int x_encode(void *in_buf, int in_size, audio_descr_t *in_descr,
- void *out_buf, int *out_size, void *state);
- int x_decode(void *in_buf, int in_size, audio_descr_t *out_descr,
- void *out_buf, int *out_size, void *state);
-
- ------------------------------
- 1. This restriction allows reasonable buffer sizing for the receiver.
-
-
- H. Schulzrinne Expires 10/01/93 [Page 5]
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- x_init initializes a particular instance of a codec. If the argument state
- is zero, a memory area sufficient to hold the encoder or decoder state is
- allocated; if that argument is non-zero, the existing area is reinitialized.
- The function returns a pointer to the area, zero if the state area could
- not be allocated. The argument period refers to the amount of audio
- data in each block, measured in seconds. It is typically only used for
- block-oriented codecs.
-
- The generic pointer to state refers to an area of storage whose structure is
- opaque to the application program. In the functions, 'x' is replaced by the
- appropriate codec name, appropriately modified to conform to C syntax (e.g.,
- g711, g721, etc).
-
- The encoder and decoder transform the data contained in the input buffer
- in_buf (in_size bytes) and deposit the result into the output buffer area
- out_buf. The variable out_size is set to the number of bytes actually
- contained in the output buffer. The ah arguments points to a structure of
- type audio_hdr_t, which defines the given input data format for the encoder
- and the desired output data format for the decoder. The functions return 0
- on success, a negative number if a failure occurred.
-
- All block-oriented audio codecs should be able to encode and decode several
- consecutive blocks.
-
-
-
- 2 Video
-
-
- The following video encodings are defined, with their abbreviated names used
- for identification:
-
-
- Bolt: The encoding is implemented by the Bolter video codec [ED: need more
- info on company, designation].
-
- JPEG: The encoding is specified in ISO Standard TBD. The data is formatted
- according to TBD.
-
- H261: The encoding is specified in ITU (formerly CCITT) standard H.261.
- The packetization and RTP-specific properties are described in RFC TBD.
-
- nv: The encoding is implemented in the program 'nv' developed at Xerox PARC
- by Ron Frederick.
-
- CUSM: The encoding is implemented in the program CU-SeeMe developed at
- Cornell University by Dick Cogger, Scott Brim, Tim Dorcey and John
- Lynn.
-
- dvc: The encoding is implemented in the program PictureWindow developed at
- Bolt, Beranek and Newman (BBN).
-
-
- H. Schulzrinne Expires 10/01/93 [Page 6]
-
-
- INTERNET-DRAFT Media July 14, 1993
-
- 3 Address of Author
-
-
- Henning Schulzrinne
- AT&T Bell Laboratories
- MH 2A244
- 600 Mountain Avenue
- Murray Hill, NJ 07974
- telephone: +1 908 582 2262
- electronic mail: hgs@research.att.com
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- H. Schulzrinne Expires 10/01/93 [Page 7]
-